Open Research Data Projects

Projects funded in the framework of the ORD Program

The joint ORD program of ETH Zurich, EPFL and the four research institutes of the ETH Domain has financially supported more than 60 research projects in the period 2020–2023. Funding supports researchers engaging in, or developing, ORD practices with and for their community and assists these researchers in becoming Open Research Data leaders in their field.

This page provides an overview of these projects. It highlights how researchers in the ETH Domain are currently applying ORD in exemplary ways. Some of the projects have already been completed; others are still in progress. The projects have been divided into three categories.

“Establish” projects help link existing ORD practices to a research agenda to establish them on a broader basis. They contribute to a shared and comprehensive understanding of ORD practices that can then become de facto standards.

“Explore” projects are the most extensive ventures in the program and are designed to explore and test early-stage ORD practices. The goal is to map processes of what an ORD practice might look like and develop prototypes. Through these projects, new teams form across disciplines and institutions.

“Contribute” projects help scientists integrate their research data into existing, often international, infrastructures. By standardizing the processes and making them generally accessible, the data are validated, and their potential is considerably expanded.

MMS (Masonry MicroStructures database) - A 3D masonry microstructures database for advancing numerical research on irregular stone masonry structures

Category

Contribute

Institutions

EPFL

Data type

Microstructure database

Field

Materials Science

Researchers

Shah, Mati Ullah

Abstract

Stone masonry is an eco-friendly construction material, but its use has declined due to its vulnerability to earthquakes, mainly because of the poor arrangement of its microstructure. The microstructure includes the shape, size, and arrangement of stone units, which vary based on geographic, temporal, and material factors. Current building codes cannot fully account for this variability, and experimental studies are costly and impractical due to the diversity of masonry typologies. Numerical studies offer a solution, but creating realistic microstructures for modeling irregular stone masonry is complex and time-consuming. As a result, simplified microstructures are often used in simulations, which fail to capture the complexities of irregular masonry walls. To address this challenge, we have developed a 3D masonry microstructures database ready to use in numerical simulations. To enhance accessibility and usability, this project aims to create a web-based platform hosting this curated database of 3D microstructures and their geometric indices. The proposed web-based platform will also feature a tool for evaluating masonry quality using the Masonry Quality Index (MQI) from 2D images, promoting the preservation of historic structures and sustainable construction practices. Additionally, the platform will enable researchers to contribute and document new 3D microstructures, fostering collaboration and advancing numerical research on stone masonry.

Application Programming Interface for the River to Ocean Geodatabase for Education and Research

Category

Contribute

Institutions

ETH Zurich

Data type

Environment

Field

Earth sciences

Researchers

Paradis, Sarah

Abstract

In order to advance our understanding of the carbon cycle, it is essential to evaluate the spatiotemporal variations of carbon between river and marine environments and gain insights into the pathways of carbon transfer from land to ocean. To do this, we need to work jointly with riverine and marine data, accounting for their temporal and spatial distribution. However, each of these systems has different data and metadata reporting strategies that need to be accounted for, which complicates their joint application. Efforts have been made to compile data from each of these systems into independent databases, but no attempt has yet been made to create a joint database covering both systems while accounting for their different metadata. Hence, this project aims to bring together riverine and marine data into one database, the River to Ocean Geodatabase for Education and Research (ROGER), so that data from both systems can be queried easily. This database will be displayed in an interactive web interface that queries riverine and/or marine data depending on the user’s requirements through a REST API. Harnessing the advanced geographical functions of PostgreSQL, the REST API will include functions that allow users to geospatially integrate riverine and marine data. This new database will provide a crucial step forward in the understanding of the carbon cycle along the land-ocean continuum, while ensuring that the data complies with best Open Research Data practices.

Development of standardized Respiratory Open Access Research

Category

Contribute

Institutions

EPFL

Data type

Medical data

Field

Life sciences

Researchers

Dan, Jonathan

Abstract

Chronic cough is a common condition globally. While efforts are being made to develop wearables to detect and quantify cough events automatically, such monitoring devices have not yet been incorporated into routine clinical practice due to a lack of consistency in their validation, resulting in slow progress and a lack of trust in reported results. We have identified three main reasons for this heterogeneity: 1) the clinical definition of different cough events, and especially the delimitation of their beginning and end, lacks standardization, 2) the data used is typically private and imbalanced, with inadequate labelling as a result of the previous point, and 3) methodologies to assess the accuracy of event detection differ between research groups and are often inappropriate. This proposal builds on ORD datasets, community guidelines, and standards to propose a unified framework for validating cough event detection algorithms. The main objective is the development of standards that will unify the workflow for validating respiratory event detection algorithms and ensure that data adheres to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). This will be distributed through a website, serving as a central hub and reference for standardizing clinical definitions and methodologies, leading to a future benchmarking platform for respiratory event detection algorithms.
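To illustrate what event-level validation of a cough detector can involve, the following is a minimal sketch of one possible scoring scheme. It is not the project's standard: the any-overlap matching rule, the function names, and the sample intervals are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# An event is a (start_seconds, end_seconds) interval. Hypothetical representation.
Event = Tuple[float, float]


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two intervals share any time (assumed matching rule)."""
    return a[0] < b[1] and b[0] < a[1]


def event_scores(reference: List[Event], detected: List[Event]) -> Dict[str, float]:
    """Compute event-level sensitivity, precision and F1 using any-overlap matching."""
    # Reference events that are hit by at least one detection.
    matched_ref = sum(any(overlaps(r, d) for d in detected) for r in reference)
    # Detections that hit at least one reference event.
    matched_det = sum(any(overlaps(d, r) for r in reference) for d in detected)

    sensitivity = matched_ref / len(reference) if reference else 0.0
    precision = matched_det / len(detected) if detected else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Made-up annotations and detections, in seconds.
    reference = [(1.0, 1.5), (4.0, 4.6), (9.0, 9.4)]
    detected = [(1.1, 1.4), (4.5, 5.0), (7.0, 7.3)]
    print(event_scores(reference, detected))
```

Different matching rules (for example, requiring a minimum overlap fraction) give different scores for the same detector, which is one reason a shared definition matters when comparing results between research groups.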

An ORD framework for synthetic mobility data

Category

Contribute

Institutions

ETH Zurich

Data type

Mobility data

Field

Urban studies

Researchers

Balac, Milos

Abstract

The eqasim pipeline, developed at ETH, is an ORD tool for generating synthetic travel demand datasets. However, its current implementation in Switzerland has several limitations. The Swiss implementation relies on non-open data from the Swiss Federal Office of Statistics, which complicates data sharing and slows research due to the need for lengthy data privacy agreements. This limits the speed of innovation and collaboration among researchers. Additionally, each new application of eqasim requires duplicating and modifying the codebase to incorporate new datasets, resulting in fragmented approaches. This project aims to overcome these limitations by enhancing the standardization of the eqasim framework, improving the ease of data exchange, and enabling better control over algorithms and data extraction. By addressing these issues, the project will streamline data sharing and accelerate research, ultimately increasing the impact of eqasim in Switzerland and beyond.

FAIRifying Urban Climate Modeling for Greening Cities

Category

Contribute

Institutions

EPFL

Data type

Climate

Field

Urban studies

Researchers

Manoli, Gabriele

Abstract

This project aims to translate the UT&C model to an open-source programming language which concurrently enables high computational efficiency and modularity. UT&C is a widely used urban climate model with a detailed vegetation scheme, which makes it well suited to informing urban greening strategies in cities around the world. However, the current code is written in a proprietary language and is computationally heavy. By translating the UT&C code into Python and making it more open, FAIR, and user-friendly, this project will open up new scientific opportunities (e.g., city-scale simulations, model coupling), facilitate community-based development, and increase the model's accessibility to the broader urban climate and urban planning communities.

FAIRifying LéXPLORE: enhancing open research data pipelines for Advanced LaKe SciencE

Category

Contribute

Institutions

EPFL

Data type

Environment

Field

Earth sciences

Researchers

Tofield-Pasche, Natacha

Abstract

The LéXPLORE platform (LP) is an innovative, open-water infrastructure on Lake Geneva, where multidisciplinary data are acquired at high frequency. Datalakes (DL) is a web-based open access data platform that provides, for LP, seven datasets in real time. We have identified two challenges that could limit the FAIR approach to the data. This project aims to address them by: a) enhancing the robustness of the LP data transfer pipelines and b) enhancing DL data quality by prototyping a collaborative QA/QC solution. The first objective is to strengthen the LP data pipeline by decoupling the data gathering from the data processing functions. The resulting simplified distribution of responsibilities will facilitate long-term maintenance and secure the system's long-term reliability. The second objective is to design a prototype of a collaborative QA/QC tool. For this, we will collaborate closely with the LP community by organising two workshops and a QA/QC hackathon. The developed prototype will allow domain experts to efficiently assess and flag data quality issues, in order to improve the accuracy and reliability of sensor data. This project will involve the operational LP team, the DL core developers, and research software engineers from ENAC-IT4R. The proposed improvements will help to strategically maintain and elevate LP's high scientific impact in the long term. DL will continue to guide the lake research community towards embracing Open Research Data practices.

Development and dissemination of Python packages for FAIR Data Acquisition

Category

Contribute

Institutions

EPFL

Data type

Data management

Field

Data management

Researchers

Jotzu, Gregor

Abstract

The generation of FAIR (Findable, Accessible, Interoperable & Reusable) data in experimental research is best supported by ensuring that the process of data acquisition is systematic and transparent, and that the control and calibration of experiments are systematic and reproducible. This project aims to contribute to improving, expanding and disseminating (including training and threshold-lowering measures) a modular Python-based open-source software for data acquisition and experiment control which has emerged from a community of optics researchers, but could eventually find a much wider user base.

Open Processing of Airborne imaging Spectrometry

Category

Contribute

Institutions

EPFL

Data type

Georeferencing

Field

Earth sciences

Researchers

Skaloud, Jan

Abstract

Developed by NASA-JPL and operated by ARES, a consortium of Swiss universities, AVIRIS-4 is the most advanced airborne imaging spectrometer (AIS) currently operational in Europe. In agreement with NASA-JPL, ARES will make all data produced by AVIRIS-4 publicly available and is building an environment of open tools to make the processing of this data accessible, interoperable and reproducible, in line with FAIR principles. During our previous ORD-Contribute project, OGAIS, we developed an open tool which can be used to label point correspondences in AIS images by hand. This tool enables the airborne remote sensing community to obtain ground truth “tie-points” for evaluating the quality of the scene reconstruction and/or improving the image georeferencing accuracy. After the first flights of AVIRIS-4 during the spring and summer of this year, a clear need emerged for high-resolution missions (0.3–1 m/pixel) to obtain point correspondence labels without human intervention in order to i) improve the conventional (direct) georeferencing and ii) automate the quality assessment on all current and future missions featuring more than a couple of flight lines (>0.5 TB / mission). This project therefore proposes to support ARES ORD practices by providing tools to automate the detection of tie-points as spatial constraints in overlapping AVIRIS-4 images, and to integrate them in EPFL’s open, online georeferencing service ODYN to maximise findability.

Integrating and Enhancing Building Data for Advanced Research: NEST-Bot

Category

Contribute

Institutions

Empa

Data type

Data management

Field

Data management

Researchers

Heer, Philipp

Abstract

The built environment generates complex and heterogeneous data, categorized into three main types: structural and architectural information, performance data (time series of energy consumption, temperatures, or occupancy), and administrative records (contracts, costs). Despite the critical value of ORD in fostering scalable applications, significant challenges persist, including fragmented data storage, heterogeneity in standards, and inadequate metadata documentation, which complicate data contextualization and accessibility. This project, NEST-Bot, aims to address these challenges by enhancing data discoverability through an automated integration layer that populates a knowledge graph. NEST-Bot will train an LLM to serve as an intuitive interface through which stakeholders (ranging from academic researchers and data scientists to HVAC engineers, architects, and automation experts) can access NEST-related data. A key aspect of the project involves the automatic generation of the integration layer from existing repositories, allowing the LLM to retrieve complex, heterogeneous datasets via natural language queries. This approach aims to streamline data retrieval, enhance data quality, reduce redundancies, and make ORD practices more scalable and beneficial to the building sector. By linking and organizing diverse repositories, NEST-Bot will enable seamless interaction with complex datasets, establishing new standards for data integration and ORD in building automation and research.

Open WASH data by establishing Data Stewards and increasing FAIRness

Category

Explore

Institutions

ETH Zurich

Data type

Water, Sanitation and Hygiene Data

Field

Urban studies

Researchers

Tilley, Elizabeth

Abstract

We established the openwashdata community for the Water, Sanitation, and Hygiene (WASH) sector. We built infrastructure and communication channels, taught 100 WASH professionals the basics of data science, developed a workflow to publish WASH data following FAIR data principles, and mobilized those in the sector who were interested in joining our vision and mission. Our next step is establishing a data stewardship network, actively working with strategic partners in Malawi and South Africa by placing a fully-funded data steward within a research institute and a non-governmental organization. A newly developed 12-module "data stewardship for openwashdata" training programme will develop data management strategies and help our partners to institutionalize ORD practices long-term within their organizations. We will also further invest in the openwashdata publishing arm of the community by increasing the FAIRness of our data, critically analyzing how to better address the details of all four components of FAIR: Findability, Accessibility, Interoperability, and Reusability. We will also set up a governance structure and sounding board to ensure the long-term sustainability of the community. Through our activities and active open communication channels, we expect to create a demand for data stewardship in the WASH sector, assess the role of data stewards, and define a profile for data stewards more generally.
